In the Yule-Simon process, selection of words follows the preferential attachment mechanism, 
resulting in the power-law growth in the cumulative number of individual word occurrences. This is 
derived using mean-field approximation, assuming a continuum limit of both the time and number 
of word occurrences. However, time and word occurrences are inherently discrete in the process, and 
it is natural to assume that the cumulative number of word occurrences has a certain fluctuation 
around the average behavior predicted by the mean-field approximation. We derive the exact and 
approximate forms of the probability distribution of such fluctuation analytically and confirm that 
those probability distributions are well supported by the numerical experiments. 
The Yule-Simon process is a classical mathematical model that describes a branching
process in discrete time and state space; it was originally introduced by Yule to explain
the population dynamics of biological species in continuous time and discrete state
space [1] and later modified by Simon into the discrete time and state model [an]. In
Simon’s scheme, the process yields a word sequence. A word is added to the sequence at
every time step: a new word, or vocabulary, is created with probability $\alpha$, whereas
with the complementary probability $1 - \alpha$, or $\bar{\alpha}$, one of the existing
words in the sequence is chosen again. This process is analogous to that of book reading,
where novel or known words appear one after another sequentially. One of the significant
results of Yule’s and Simon’s works is the derivation of the population distribution that
follows the power-law form, also known as Zipf’s law in the rank-frequency distribution

m- 

Now following Simon’s scheme, let us denote by $i$ the index of distinct words sorted in
ascending order of the time at which they are created. The probability of word $i$ being
chosen among the existing words is proportional to the number of occurrences of word $i$
in the sequence, and it is defined as follows:

$$P(i, t) = n_i(t)/N(t), \tag{1}$$

where $n_i(t)$ is the cumulative number of occurrences of word $i$ until time step $t$ and
$N(t)$ is the length of the sequence at $t$, that is, the total number of word occurrences
until $t$; $N(t) = t$ by definition. The name of the preferential attachment mechanism
derives from this proportionality in the word selection, sharing the same idea as the
well-known urn models [B]. We should note that Simon himself assumed a rather weaker
condition than that in Eq. (1), which is equivalent to that implicitly assumed in Yule’s
scheme. Instead, Simon introduced the notion of class, a group of distinct words with the
same number of occurrences, to be chosen in proportion to the size of the class, that is,
the total number of word occurrences included in the class; meanwhile, the rule
determining which word is actually picked in the chosen class is arbitrary. Thus, the
probability of the class being chosen is defined as follows:

$$\mathcal{P}(n, t) = n f(n, t)/N(t), \tag{2}$$

where $n$ is the cumulative number of word occurrences, or the class, and $f(n, t)$ is the
number of distinct words included in class $n$ at time $t$. If we add to Eq. (2) the rule
that a word is picked uniformly at random within the chosen class, it leads to the same
result as in Eq. (1). We use the term “the Yule-Simon process” to refer to Eq. (1), and
our study is based on this.
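The selection rule of Eq. (1) can be illustrated with a short simulation. The following is only a minimal sketch, not the code used for the experiments reported later: the function name `simulate_yule_simon`, its parameters, and the convention that the first word is created at $t = 1$ are assumptions made for illustration. It relies on the fact that copying a uniformly random past occurrence selects word $i$ with probability $n_i(t)/N(t)$.

```python
import random
from collections import Counter

def simulate_yule_simon(alpha, steps, seed=0):
    """Return a word sequence of length `steps` (hypothetical helper)."""
    rng = random.Random(seed)
    sequence = [0]              # assumption: word 0 is created at t = 1
    next_word = 1
    for _ in range(steps - 1):
        if rng.random() < alpha:
            # With probability alpha a new word is created.
            sequence.append(next_word)
            next_word += 1
        else:
            # Copying a uniformly random past occurrence picks word i
            # with probability n_i(t) / N(t), as in Eq. (1).
            sequence.append(rng.choice(sequence))
    return sequence

seq = simulate_yule_simon(alpha=0.1, steps=20000, seed=1)
counts = Counter(seq)           # counts[i] is n_i at the final time step
```

Storing the whole sequence keeps the selection step trivial at the cost of memory proportional to the sequence length; for much longer runs a different data structure may be preferable.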

The Yule-Simon process has been used as an archetype of various other dynamic processes
such as the Barabási-Albert (BA) graph model [7], which describes the growth of the web
and represents a specific case of the process with $\alpha = 1/2$. In the BA graph, the
graph grows by adding nodes (webpages) one by one, each bringing a certain number of edges
(hyperlinks) that connect to the existing nodes in proportion to their degree, that is,
the number of edges belonging to the target node. We see a direct correspondence between
the models; “node” and “degree” in the BA graph are paraphrases of “word” and “word
occurrence,” respectively, in the Yule-Simon process [5]. Barabási and others analyzed how
a node gathers edges during the evolution and showed that the degree grows in a power-law
fashion in the continuum limit of time and degree as follows:

$$k_i(t) \propto (t/t_i)^{1/2}, \tag{3}$$

where $k_i(t)$ is the expected degree of node $i$ at time $t$ and $t_i$ is the time when
node $i$ joined the graph. Following the same logic, the expected value of the cumulative
number of occurrences of word $i$ at time $t$, denoted by $n_i^*(t)$, is derived as
follows:

$$n_i^*(t + \Delta t) = n_i^*(t) + (1 - \alpha) P(i, t) \Delta t.$$

Then, via the integral form

$$\int \frac{dn_i^*}{n_i^*} = (1 - \alpha) \int \frac{dt}{t}, \tag{4}$$

we obtain

$$n_i^*(t) = (t/t_i)^{1 - \alpha}, \tag{5}$$

using the initial condition $n_i^*(t_i) = 1$. The homology between Eq. (3) and Eq. (5)
implies that the BA graph is actually a particular case of the Yule-Simon process with
$\alpha = 1/2$.
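For readers who want the intermediate steps, the passage from the difference equation to the power-law growth can be written out as below. This is a sketch under the mean-field assumption that $P(i, t)$ is evaluated at the expected value, $P(i, t) = n_i^*(t)/t$, and that $\Delta t \to 0$; the integration limits follow from the initial condition $n_i^*(t_i) = 1$.

```latex
\begin{align*}
\frac{dn_i^*}{dt} &= (1-\alpha)\,\frac{n_i^*(t)}{t}, \\
\int_{1}^{n_i^*(t)} \frac{dn'}{n'} &= (1-\alpha)\int_{t_i}^{t} \frac{dt'}{t'}, \\
\ln n_i^*(t) &= (1-\alpha)\,\ln\frac{t}{t_i}, \\
n_i^*(t) &= \left(\frac{t}{t_i}\right)^{1-\alpha}.
\end{align*}
```

Setting $\alpha = 1/2$ gives the exponent $1/2$ of the BA degree growth.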

The mean-field approximation elucidates the expected behavior of the increase in the
cumulative number of word occurrences under the preferential attachment mechanism, as
shown above. Even so, we can assume that individual word occurrences will deviate from the
expected value over a certain period of observation; there might be words that occur more
frequently than expected and others that appear less frequently. We can likely attribute
such individuality to factors such as the so-called fitness [5] of each word,
environmental contingency, or the inherent dynamics of the system. What shape the
probability distribution of such fluctuation takes is an interesting question, since
anomalous behavior often attracts our interest more than ordinary behavior [10]; in
addition, knowing the shape of the distribution function might provide a useful
theoretical baseline for comparing the growth of distinct words in, for example, social
annotation systems [11, 12] and network elements in complex networks [13] that joined the
system at close points in time.

Based on a similar motivation, Krapivsky and Redner investigated the fluctuation of the
degree distribution in networks, that is, the fluctuation of the numbers of nodes that
have the same degree [14]. In other examples, specific scaling laws between the growth
rate, that is, the ratio of the sizes of system components at two consecutive time points,
and its fluctuation have been investigated in various social systems such as city size,
scientific output, human communication, and so on [15-17]. Those works focus on the growth
fluctuation of the class mentioned above, that is, a group of system components that have
the same size, as a function of each size. In contrast, we focus on the fluctuation
observed in individual components.

In the following, we first analytically derive the probability distribution of the growth
fluctuation that individual words exhibit under the preferential attachment mechanism.
Then, we check the validity of the formula through a comparison with the results from
numerical experiments.

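Let us denote by $P(n_i(t) = n)$ the probability that the cumulative number of occurrences
of word $i$ at $t$, denoted by $n_i(t)$, is equal to $n$, and by $P(n_i(t) \to n)$ the
probability that $n_i(t)$ becomes $n$ from $n - 1$ right at $t$. Introducing $\tau$, the
elapsed time from $t_i$, and $s_i = t_i + \tau$ as the time at which the probabilities are
measured, $P(n_i(s_i) = n)$ and $P(n_i(s_i) \to n)$ can be written recursively as follows:
For $n = 1$,

$$P(n_i(s_i) = 1) = \frac{\Gamma(t_i)\,\Gamma(s_i - \bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}, \tag{6}$$

and for $n = 2$,

$$\begin{aligned}
P(n_i(s_i) \to 2) &= P(n_i(s_i - 1) = 1)\,\frac{\bar{\alpha}}{s_i - 1}
 = \bar{\alpha}\,\frac{\Gamma(t_i)\,\Gamma(s_i - 1 - \bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}, \\
P(n_i(s_i) = 2) &= \sum_{u = t_i + 1}^{s_i} P(n_i(u) \to 2) \prod_{t = u}^{s_i - 1} \left( \alpha + \bar{\alpha}\,\frac{t - 2}{t} \right) \\
 &= \bar{\alpha}\,\frac{\Gamma(t_i)\,\Gamma(s_i - 2\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \sum_{u = t_i + 1}^{s_i} \frac{\Gamma(u - 1 - \bar{\alpha})}{\Gamma(u - 2\bar{\alpha})}.
\end{aligned} \tag{7}$$

Equation (6) means that word $i$ is not chosen during the period $\tau$ following its first
appearance. Equation (7) means that at a certain time point in the interval
$[t_i + 1 : s_i]$, word $i$ is chosen only once and after that, it is never chosen until
$s_i$.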
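The Gamma-function form for $n = 1$ can be checked against a direct step-by-step product. The sketch below assumes that, when the sequence has length $t$, a word with a single occurrence is copied at the next step with probability $\bar{\alpha}/t$; the function names are hypothetical and the log-Gamma evaluation is a numerical convenience rather than part of the derivation.

```python
import math

def no_selection_product(t_i, s_i, alpha):
    """Probability that word i keeps n = 1 from t_i to s_i, step by step."""
    abar = 1.0 - alpha
    p = 1.0
    for t in range(t_i, s_i):        # t is the sequence length before the step
        p *= 1.0 - abar / t          # word i (one occurrence) is not copied
    return p

def eq6_gamma_form(t_i, s_i, alpha):
    """Closed form of Eq. (6), evaluated through log-Gamma for stability."""
    abar = 1.0 - alpha
    log_p = (math.lgamma(t_i) + math.lgamma(s_i - abar)
             - math.lgamma(s_i) - math.lgamma(t_i - abar))
    return math.exp(log_p)
```

For example, `no_selection_product(10, 50, 0.3)` and `eq6_gamma_form(10, 50, 0.3)` should agree up to floating-point error.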


Further, for $n > 2$, the form of the probabilities becomes more complicated because it
contains weighted and nested sums of ratios of Gamma functions. However, let us write down
a few more values one by one:
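For $n = 3$,

$$\begin{aligned}
P(n_i(s_i) \to 3) &= P(n_i(s_i - 1) = 2)\,\frac{2\bar{\alpha}}{s_i - 1} \\
 &= 2\bar{\alpha}^2\,\frac{\Gamma(t_i)\,\Gamma(s_i - 1 - 2\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \sum_{u = t_i + 1}^{s_i - 1} \frac{\Gamma(u - 1 - \bar{\alpha})}{\Gamma(u - 2\bar{\alpha})}, \\
P(n_i(s_i) = 3) &= \sum_{u = t_i + 2}^{s_i} P(n_i(u) \to 3) \prod_{t = u}^{s_i - 1} \left( \alpha + \bar{\alpha}\,\frac{t - 3}{t} \right) \\
 &= 2\bar{\alpha}^2\,\frac{\Gamma(t_i)\,\Gamma(s_i - 3\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \sum_{u = t_i + 2}^{s_i} \frac{\Gamma(u - 1 - 2\bar{\alpha})}{\Gamma(u - 3\bar{\alpha})}
 \sum_{v = t_i + 1}^{u - 1} \frac{\Gamma(v - 1 - \bar{\alpha})}{\Gamma(v - 2\bar{\alpha})},
\end{aligned} \tag{8}$$

and for $n = 4$,

$$\begin{aligned}
P(n_i(s_i) \to 4) &= P(n_i(s_i - 1) = 3)\,\frac{3\bar{\alpha}}{s_i - 1} \\
 &= 6\bar{\alpha}^3\,\frac{\Gamma(t_i)\,\Gamma(s_i - 1 - 3\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \sum_{u = t_i + 2}^{s_i - 1} \frac{\Gamma(u - 1 - 2\bar{\alpha})}{\Gamma(u - 3\bar{\alpha})}
 \sum_{v = t_i + 1}^{u - 1} \frac{\Gamma(v - 1 - \bar{\alpha})}{\Gamma(v - 2\bar{\alpha})}, \\
P(n_i(s_i) = 4) &= \sum_{u = t_i + 3}^{s_i} P(n_i(u) \to 4) \prod_{t = u}^{s_i - 1} \left( \alpha + \bar{\alpha}\,\frac{t - 4}{t} \right) \\
 &= 6\bar{\alpha}^3\,\frac{\Gamma(t_i)\,\Gamma(s_i - 4\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \sum_{u = t_i + 3}^{s_i} \frac{\Gamma(u - 1 - 3\bar{\alpha})}{\Gamma(u - 4\bar{\alpha})}
 \sum_{v = t_i + 2}^{u - 1} \frac{\Gamma(v - 1 - 2\bar{\alpha})}{\Gamma(v - 3\bar{\alpha})}
 \sum_{w = t_i + 1}^{v - 1} \frac{\Gamma(w - 1 - \bar{\alpha})}{\Gamma(w - 2\bar{\alpha})}.
\end{aligned} \tag{9}$$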


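Looking carefully at Eqs. (6), (7), (8), and (9), we can inductively infer their general
form as follows:

$$P(n_i(s_i) = n) =
\begin{cases}
\dfrac{\Gamma(t_i)\,\Gamma(s_i - \bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})} & \text{if } n = 1, \\[2ex]
(n - 1)!\,\bar{\alpha}^{\,n-1}\,\dfrac{\Gamma(t_i)\,\Gamma(s_i - n\bar{\alpha})}{\Gamma(s_i)\,\Gamma(t_i - \bar{\alpha})}
 \displaystyle\sum_{\phi = t_i + n - 1}^{s_i} S_n(\phi) & \text{if } n > 1.
\end{cases} \tag{10}$$

The term $S_n(\phi)$ is defined as the following recursive function with a depth of
$n - 1$:

$$S_n(\phi) =
\begin{cases}
\dfrac{\Gamma(\phi - 1 - \bar{\alpha})}{\Gamma(\phi - 2\bar{\alpha})} & \text{if } n = 2, \\[2ex]
\dfrac{\Gamma(\phi - 1 - (n - 1)\bar{\alpha})}{\Gamma(\phi - n\bar{\alpha})}
 \displaystyle\sum_{\psi = t_i + n - 2}^{\phi - 1} S_{n-1}(\psi) & \text{if } n > 2.
\end{cases} \tag{11}$$

This is the exact form of the probability distribution wherein the cumulative number of
occurrences of word $i$ at time $s_i$ will be $n$. For sufficiently large values of $t_i$
and $s_i$, these equations can be asymptotically transformed as follows:

$$P(n_i(s_i) = n) \simeq
\begin{cases}
t_i^{\bar{\alpha}}\, s_i^{-\bar{\alpha}} & \text{if } n = 1, \\[1ex]
(n - 1)!\,\bar{\alpha}^{\,n-1}\, t_i^{\bar{\alpha}}\, s_i^{-n\bar{\alpha}}
 \displaystyle\sum_{\phi = t_i + n - 1}^{s_i} S_n(\phi) & \text{if } n > 1,
\end{cases} \tag{12}$$

and

$$S_n(\phi) \simeq
\begin{cases}
\phi^{-\alpha} & \text{if } n = 2, \\[1ex]
\phi^{-\alpha} \displaystyle\sum_{\psi = t_i + n - 2}^{\phi - 1} S_{n-1}(\psi) & \text{if } n > 2,
\end{cases} \tag{13}$$

where we use the asymptotic approximation of the ratio of Gamma functions for large $t$:
$\Gamma(t - a)/\Gamma(t) \simeq t^{-a}$ as $t \to \infty$. Equations (12) and (13)
represent one of the principal results of this article.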

If $\alpha \to 0$, or $\bar{\alpha} \to 1$, all weighting factors in Eq. (13), or all
ratios of Gamma functions in Eq. (11), become exactly equal to 1. Consequently, we obtain
a specific value of the sum part of Eqs. (10) and (12) as follows:

$$\sum_{\phi = t_i + n - 1}^{s_i} S_n(\phi) \underset{\alpha \to 0}{\simeq} \frac{(\tau - n + 2)^{n-1}}{(n - 1)!}, \tag{14}$$

which is the volume of an $(n - 1)$-dimensional triangular pyramid where all of the edges
aligned to a corresponding basis vector have the length $\tau - n + 2$. Substituting
Eq. (14) into Eq. (12), we obtain a relatively simple form as follows:

$$P(n_i(s_i) = n) \underset{\alpha \to 0}{\simeq} t_i\, s_i^{-n}\, (\tau - n + 2)^{n-1}. \tag{15}$$
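The nested sum with all weights equal to 1 can be evaluated exactly for small cases and compared with the pyramid volume. The sketch below is an illustration only; the function names are hypothetical. With unit weights the nested sum counts strictly increasing index tuples between $t_i + 1$ and $s_i$, so it equals the binomial coefficient $\binom{\tau}{n-1}$, and the pyramid volume $(\tau - n + 2)^{n-1}/(n-1)!$ is a lower estimate that approaches it when $\tau \gg n$.

```python
import math
from functools import lru_cache

def nested_sum_alpha_zero(t_i, s_i, n):
    """Sum over phi of S_n(phi) from Eqs. (10)-(11) with every Gamma ratio set to 1."""
    @lru_cache(maxsize=None)
    def S(m, phi):
        if m == 2:
            return 1
        # psi runs from t_i + m - 2 to phi - 1, as in Eq. (11)
        return sum(S(m - 1, psi) for psi in range(t_i + m - 2, phi))
    return sum(S(n, phi) for phi in range(t_i + n - 1, s_i + 1))

def pyramid_volume(tau, n):
    """Volume estimate (tau - n + 2)^(n-1) / (n-1)! for the same sum."""
    return (tau - n + 2) ** (n - 1) / math.factorial(n - 1)
```

For instance, with $t_i = 100$, $s_i = 130$ (so $\tau = 30$) and $n = 4$, the exact count can be compared with `math.comb(30, 3)` and with `pyramid_volume(30, 4)`.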

Alternatively, in the case of larger $\alpha$ such as 1/2 in the BA graph, it is unclear
whether a simple form like Eq. (15) is available, so we have to numerically calculate
Eqs. (12) and (13) directly, if needed. Practically, if we calculate all terms in the
nested sum naively, it requires approximately $\tau^n$ operations, and such a large
calculation easily becomes infeasible. Once we have calculated any $S_m(\phi)$ ($m$ starts
from two), storing and reusing the values associated with the pair of $m$ and $\phi$
drastically reduces the total amount of calculation and makes the calculation feasible.
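The storing-and-reusing idea can be sketched as follows. This is one possible arrangement, not the implementation used for the reported results: it evaluates the exact forms of Eqs. (10) and (11) rather than the asymptotic Eqs. (12) and (13), keeps only the stored values of the previous level $S_{n-1}$, and accumulates the inner sum as a running total. The function name and parameters are hypothetical, and $0 < \alpha < 1$ and $t_i \geq 1$ are assumed so that every Gamma argument is positive.

```python
import math

def exact_occurrence_distribution(t_i, s_i, alpha, n_max):
    """Evaluate Eqs. (10)-(11) for n = 1..n_max, reusing stored S_m(phi) values."""
    abar = 1.0 - alpha

    def prefactor(n):
        # Gamma(t_i) Gamma(s_i - n*abar) / (Gamma(s_i) Gamma(t_i - abar))
        return math.exp(math.lgamma(t_i) + math.lgamma(s_i - n * abar)
                        - math.lgamma(s_i) - math.lgamma(t_i - abar))

    def weight(n, phi):
        # Gamma(phi - 1 - (n-1)*abar) / Gamma(phi - n*abar)
        return math.exp(math.lgamma(phi - 1 - (n - 1) * abar)
                        - math.lgamma(phi - n * abar))

    probs = {1: prefactor(1)}
    stored = {}                  # stored[phi] = S_{n-1}(phi) from the previous level
    for n in range(2, n_max + 1):
        first = t_i + n - 1      # smallest phi at which n occurrences are possible
        if first > s_i:
            break
        current = {}
        inner = 0.0              # running sum of S_{n-1}(psi) for psi <= phi - 1
        for phi in range(first, s_i + 1):
            if n == 2:
                current[phi] = weight(2, phi)
            else:
                inner += stored[phi - 1]
                current[phi] = weight(n, phi) * inner
        probs[n] = (math.factorial(n - 1) * abar ** (n - 1)
                    * prefactor(n) * sum(current.values()))
        stored = current
    return probs
```

With this arrangement the cost grows roughly as $n\tau$ instead of $\tau^n$. For large $n$ the factorial and the sums may need to be handled in log space, which this sketch does not attempt.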

The following discussion is based on Eq. (15), the particular form for a sufficiently
small $\alpha$. What we eventually want to know is the scale of the deviation of the
cumulative number of individual word occurrences from the expected value, the core
question of this study. The absolute size of the deviation depends on $t_i$ as well as
$\tau$; Eq. (5) expresses that the cumulative number of word occurrences increases more
slowly with a larger $t_i$, and therefore the size of the deviation of such words is
supposed to be relatively smaller than that of words with a smaller $t_i$ if they use the
same $\tau$. Thus, the size of the deviation should be normalized depending on $t_i$ using
different values of $\tau$. Now we introduce a scale factor $\lambda$ as follows:

$$s_i = t_i + \tau_i = \lambda t_i. \tag{16}$$

Here $\tau_i$, the observation period of the deviation, varies word by word, and $\lambda$
is constant for every word and greater than one by definition. Substituting Eq. (16) into
Eq. (5), we obtain:

$$n_i^*(s_i) = \lambda^{1 - \alpha} \underset{\alpha \to 0}{=} \lambda. \tag{17}$$


This temporally normalized expected value of the cumulative number of word occurrences,
$\lambda$, is used as a reference value to measure the scale of the deviation for each
word. Replacing $n$ in Eq. (15) with $x\lambda$, that is, $x$ times the reference value,
we obtain:

$$\begin{aligned}
P(n_i(s_i) = x\lambda) &\underset{\alpha \to 0}{\simeq} t_i\,(\lambda t_i)^{-x\lambda}\,\{(\lambda - 1)t_i - x\lambda + 2\}^{x\lambda - 1} \\
 &= \lambda^{-1} \left( 1 - \frac{1}{\lambda} - \frac{x - 2/\lambda}{t_i} \right)^{x\lambda - 1}.
\end{aligned} \tag{18}$$

The idea of $x$, the scale of the deviation, is depicted in Fig. 1. For a large majority of
words, supposing $t_i \gg x \sim 1$, Eq. (18) is approximated as:

$$P(n_i(s_i) = x\lambda) \underset{\alpha \to 0}{\simeq} \frac{1}{\lambda - 1} \left( 1 - \frac{1}{\lambda} \right)^{x\lambda}, \tag{19}$$

which is independent of $t_i$, that is, independent of the word. Hence, this formula
represents the probability distribution of the fluctuation for all words. This concise
relationship is the other principal result of this article. Equation (19) clearly shows
that the probability distribution of the deviation scale decays exponentially.
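The exponential decay of Eq. (19) can be made explicit numerically. The sketch below is illustrative and the function names are hypothetical. Writing $(1 - 1/\lambda)^{x\lambda} = \exp(-a x)$ gives the decay rate $a = -\lambda \ln(1 - 1/\lambda)$, which tends to 1 as $\lambda$ grows; summing Eq. (19) over $n = x\lambda = 1, 2, \dots$ gives a geometric series that adds up to 1.

```python
import math

def eq19(x, lam):
    """Eq. (19): small-alpha probability that n_i(s_i) equals x * lam."""
    return (1.0 / (lam - 1.0)) * (1.0 - 1.0 / lam) ** (x * lam)

def decay_rate(lam):
    """Rate a such that Eq. (19) is proportional to exp(-a * x)."""
    return -lam * math.log(1.0 - 1.0 / lam)

# Sum over n = 1, 2, ... with x = n / lam; the tail beyond 5000 terms is negligible here.
total = sum(eq19(n / 5.0, 5.0) for n in range(1, 5001))
```

Here `total` should be numerically indistinguishable from 1, and `decay_rate(lam)` can be evaluated for the scale factors of interest to see how quickly it approaches 1.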



FIG. 1. A diagram of the relationships between the variables, depicting the growth of the
cumulative number of word occurrences.



FIG. 2. The rank-frequency distribution. Black circles and dotted lines show the
simulation results and theoretical curves proportional to
$[\text{word rank}]^{\alpha - 1}$, respectively.


We confirm that the general form (12) and the particular form for a sufficiently small
$\alpha$ (19) well predict the actual behavior of the growth fluctuation in the cumulative
number of word occurrences.

First, we ran the numerical simulation of the Yule-Simon process for different $\alpha$
values of 0.01, 0.1, and 0.5, where the total number of word occurrences is 10^;
consequently, the final vocabulary sizes are approximately 10®, 10®, and 5 x 10®,
respectively. Figure 2 shows the rank-frequency distribution for each $\alpha$ value, and
we see that Zipf’s law actually holds in every case with the power exponent $1 - \alpha$
predicted by the model. We also show three typical patterns of the growth of the
cumulative number of word occurrences in the case of $\alpha = 0.1$: the word occurrences
of the three sampled words (89th, 90th, and 91st) increase (A) following, (B) exceeding,
and (C) falling behind the expected growth curve, respectively (Fig. 3). These three words
are created at close time points; however, they exhibit differing growth courses. This
word-by-word fluctuation is what we have been trying to explain in this study.

Using the simulation result, we measured the probability distribution of the scale of the
deviation from the reference value for different $\lambda$ values of 2, 5, and 10. To
calculate the actual values of Eq. (12), we used the same values of $t_i$ as in the
simulation. The results are shown in Fig. 4; for all $\alpha$ and $\lambda$ values, the
simulation results exhibit a good match with the general solution, and we conclude that
our inductive derivation of Eq. (12) and Eq. (10) is valid. In addition, we found that the
simulation results exhibit a good fit with an exponential function with an identical
characteristic scale of approximately 1; the fitted parameters related to the simulation
results are shown in Table I. This result seems to share the same relationship with
Eq. (19), which reduces to a form proportional to $\exp(-x)$ in the asymptotic limit of
large $\lambda$. In this study, we keep the further discussion of this aspect on hold. For
small $\alpha$ values, we also see a good match between the particular solution and the
other results; meanwhile, the mismatch between them increases for large $\alpha$ values.
This is consistent with our assumption concerning the asymptotic behavior of the
particular solution.

FIG. 3. The growth of the cumulative number of word occurrences of three sampled words
created (A) 89th, (B) 90th, and (C) 91st in the case of $\alpha = 0.1$. Solid and dotted
lines show the actual growth curves and the corresponding expected growth curves,
respectively.

TABLE I. Fitted parameters for the simulation results for
$(a\delta)\exp(-a(x - \delta))$, where $\delta$ is a given interval in the calculation of
the distribution function and corresponds to the $x$ values of the far-left white circles
in Fig. 4.

α       λ     δ           a           Std. Err.
0.01    2     0.501187    0.999232    0.0564
0.01    5     0.199526    1.02095     0.01522
0.01    10    0.102329    0.988124    0.008821
0.1     2     0.524807    1.02668     0.05026
0.1     5     0.234423    0.998086    0.01758
0.1     10    0.125893    0.995484    0.008675
0.5     2     0.691831    1.03491     0.07246
0.5     5     0.446684    1.00143     0.04125
0.5     10    0.316228    0.996059    0.02533



In summary, we derived the probability distribution of the fluctuation in the growth of
the cumulative number of individual word occurrences under the preferential attachment
mechanism, based on the Yule-Simon process. The distribution function was represented by
the particular form for a sufficiently small $\alpha$, the creation rate of new
vocabulary, which shows exponential decay with an increasing deviation scale. We also
obtained the general form of the probability distribution of word occurrences and showed
numerically that the solution follows the exponential decay in the growth fluctuation. We
confirmed that the theoretical solutions and the simulation results matched well,
concluding that our inductive derivation seems suitable.

The idea of the growth fluctuation in the preferential attachment dynamics focused on in
this study and its solution raise further questions, as follows:

1. The BA graph was introduced to explain the growth of the web; do webpages or websites
actually exhibit exponential decay in the fluctuation of their individual growth?

The fact that only a weak correlation exists between the size of a website and its age
[18] implies the existence of significant individuality that might cause a deviation from
the theoretical expectation.

2. Alternatively, do we find any phenomena that do not follow our result while showing the
same population distribution, such as Zipf’s law?

This question is presumably related to the discussion on the scaling laws referred to
previously [15-17]. In addition, it is a good reminder that incorporating the fitness
function into the dynamics enables us to tune the individual growth rates; however, this
distorts even the population distribution

m- 

3. Following from the previous question and based on Simon’s derivation, which ensures the
power-law population distribution, what form of the distribution function of the
fluctuation can be derived if we use another rule for picking a word from the class other
than the uniformly random selection adopted here?

We sincerely express our gratitude to T. Ikegami, 
M. Oka, and K. Sato for many fruitful discussions and 
suggestions. 